25 research outputs found

    Unsupervised ensemble minority clustering

    Cluster analysis lies at the core of most unsupervised learning tasks. However, the majority of clustering algorithms depend on the all-in assumption, in which all objects belong to some cluster, and perform poorly on minority clustering tasks, in which a small fraction of signal data stands against a majority of noise. The approaches proposed so far for minority clustering are supervised: they require the number and distribution of the foreground and background clusters. In supervised learning and all-in clustering, combination methods have been successfully applied to obtain distribution-free learners, even from the output of weak individual algorithms. In this report, we present a novel ensemble minority clustering algorithm, Ewocs, suitable for weak clustering combination, and provide a theoretical proof of its properties under a loose set of constraints. The validity of the assumptions used in the proof is empirically assessed using a collection of synthetic datasets. Preprint.

    Non-parametric document clustering by ensemble methods

    The biases of individual algorithms for non-parametric document clustering can lead to non-optimal solutions. Ensemble clustering methods may overcome this limitation, but have not been applied to document collections. This paper presents a comparison of strategies for non-parametric document ensemble clustering. Peer reviewed. Postprint (published version).

    Unsupervised document clustering by weighted combination

    This report proposes a novel unsupervised document clustering approach based on the weighted combination of individual clusterings. Two non-weighted combination methods are adapted to work in a weighted fashion: a graph-based method and a probability-based one. The performance of the weighted approach is evaluated on real-world collections and compared to that of individual clustering and non-weighted combination. The results of this evaluation confirm that graph-based weighted combination consistently outperforms the other approaches. Postprint (published version).

    ParTes. Test suite for parsing evaluation

    This paper presents ParTes, the first test suite in Spanish and Catalan for the qualitative evaluation of parsing. This resource is a hierarchical test suite of representative syntactic structure and argument order phenomena. ParTes simplifies qualitative evaluation by contributing to the automation of this task. © 2014 Sociedad Española para el Procesamiento del Lenguaje Natural. Postprint (published version).

    TALP-UPC at TREC 2005: Experiments using voting scheme among three heterogeneous QA systems

    This paper describes the experiments of the TALP-UPC group for factoid and 'other' (definitional) questions at the TREC 2005 Main Question Answering (QA) task. Our current approach for factoid questions is based on a voting scheme among three QA systems: TALP-QA (our previous QA system), Sibyl (a new QA system developed at DAMA-UPC and TALP-UPC), and Aranea (a web-based data-driven approach). For definitional questions, we used two different systems: the TALP-QA Definitional system and LCSUM (a summarization-based system). Our results for factoid questions indicate that the voting strategy improves the accuracy from 7.5% to 17.1%. While these numbers are low (due to technical problems in the Answer Extraction phase of the TALP-QA system), they indicate that voting is a successful approach for boosting the performance of QA systems. The answer to definitional questions is produced by selecting phrases using a set of patterns associated with definitions. The best configuration of the TALP-QA Definitional system achieves an F-score of 17.2%. Postprint (published version).

    The TALP participation at TAC-KBP 2012

    This document describes the work performed by the Universitat Politècnica de Catalunya (UPC) in its first participation at TAC-KBP 2012 in both the Entity Linking and the Slot Filling tasks. Peer reviewed. Postprint (author's final draft).

    Discounted functionals of Markov processes

    SIGLE. Available from British Library Document Supply Centre (BLDSC), DSC:D063340. GB, United Kingdom.
